Page-independent multi-field validation in document capture

ABSTRACT

Techniques to validate data in document capture are disclosed. An indication is received that a validation rule associated with two or more dependent fields in a data entry form comprising data values extracted from a multi-page document has failed. A human validation interface is provided that enables an operator to view the affected dependent fields and for each an associated document image portion from which a corresponding data value was extracted, including by providing automated navigation to and display of the affected dependent fields.

BACKGROUND OF THE INVENTION

In document capture, typically pages are recognized and validated one at a time, in sequence. In the typical approach, each page is processed independently with its own data entry form, value extraction, and validation. Values may be extracted from various places on a page and across pages in the case of a multi-page document. There can be rules that determine the correctness of multiple values in the document. If the validation rule fails, then one or more of the dependent fields may be in error. The error may be incorrect optical character recognition (OCR), incorrect location of the value, incorrectly written data on the image itself, etc. It is generally not possible to determine which field is in error, and what should be the corrective action, except by human inspection. In current approaches, a human operator may have to navigate manually, opening and closing different pages and associated forms, copying over data, etc., to compare or otherwise validate data from different pages in a multi-page document.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a flow chart illustrating an embodiment of a process to capture data.

FIG. 2 is a block diagram illustrating an embodiment of a document capture system and environment.

FIG. 3 is a block diagram illustrating an embodiment of a document capture system.

FIG. 4 is a block diagram illustrating an embodiment of a data validation user interface.

FIG. 5 is a screen shot illustrating an embodiment of a technique to minimize eye strain and/or fatigue in manual indexing.

FIG. 6 is a flow chart illustrating an embodiment of a process to facilitate manual indexing.

FIG. 7 is a block diagram illustrating an embodiment of a document capture system and process.

FIG. 8 is a block diagram illustrating an embodiment of an interface to validate a multi-page document.

FIG. 9A is a flow chart illustrating an embodiment of a process to capture document data.

FIG. 9B is a flow chart illustrating an embodiment of a process to capture document data.

FIG. 10 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document.

FIG. 11 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document.

FIG. 12 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Page-independent multi-field validation in document capture is disclosed. In various embodiments, a data-entry form that spans multiple pages is populated using values extracted from different pages comprising a multi-page document. As a human operator indicates a readiness to resolve a next error or other validation task involving multi-field dependencies, an interface is provided that displays the field(s) to be validated, for each a corresponding part of the page image from which it was extracted, and a representation of the applicable validation rule that failed in automated validation. A validation rule can then be written that is independent from the value location. The operator trying to fix the problem can focus upon the business rule and the implicated fields and associated extracted values, without having to navigate manually through pages comprising the document and/or associated forms to find the other fields implicated by the validation rule and/or their associated extracted values and/or page image portions.

FIG. 1 is a flow chart illustrating an embodiment of a process to capture data. In the example shown, document content is captured into a digital format (102), e.g., by scanning the physical sheet(s) to create a scanned image. The document is classified (104). In some embodiments, classification includes detecting a document type corresponding to an associated data entry form. Data is extracted from the digital content (106), for example through optical character recognition (OCR) and/or optical mark recognition (OMR) techniques. Extracted data is validated (108). In various embodiments, validation may be performed at least in part by an automated process, for example by comparing multiple occurrences of the same value, by performing computations or other manipulations based on extracted data, etc. In various embodiments, all or a subset of extracted values, e.g., those for which less than a required degree of confidence is achieved through automated extraction and/or validation, may be validated manually, by a human indexer or other operator. Once all data has been validated, output is delivered (110), e.g., by storing the document image and associated data in an enterprise content management system or other repository.

FIG. 2 is a block diagram illustrating an embodiment of a document capture system and environment. In the example shown, a client system 212 is attached to a scanner 204. Documents are scanned by scanner 204 and the resulting document image is sent by the client system 212 to document capture system 202 for processing, e.g., using all or part of the process of FIG. 1. In the example shown, document capture system 202 uses a library of data entry forms 206 to create a structured representation of data extracted from a scanned document. For example, as in FIG. 1 steps 104 and 106, in some embodiments a document is classified by type and an instance of a corresponding data entry form is created and populated with data values extracted from the document image. In some embodiments, data validation may be performed, at least in part, by document capture system 202 by accessing external data 208 via a network 210. For example, an external third party database that associates street addresses with correct postal zip codes may be used to validate a zip code value extracted from a document. In the example shown, validation may be performed at least in part by a plurality of manual indexers each using an associated client system 212 to communicate via network 210 with document capture system 202. For example, document capture system 202 may be configured to queue human validation tasks and to serve tasks out to indexers using clients 212. Each client system 212 may use browser-based and/or installed client software to provide functionality to validate data as described herein. In some embodiments, once validation has been completed the resulting raw document image and/or form data are delivered as output, for example by storing the document image and associated form data in a repository 214, such as an enterprise content management (ECM) or other repository.

FIG. 3 is a block diagram illustrating an embodiment of a document capture system. In the example shown, the document capture system 202 of FIG. 2 is shown to receive document image data, e.g., via network 210 from a scanning client system 212. Document image data is received in some embodiments in batches and is stored in an image store 308. Document image data is provided to a data extraction module 310 which uses a data entry forms library 312 to classify each document by type and create an instance of a type-specific data entry form. Data extraction module 310 uses OCR, OMR, and/or other techniques to extract data values from the document image and uses the extracted values to populate the corresponding data entry form instance. In some embodiments, data extraction module 310 may provide a score or other indication of a degree of confidence with which an extracted value has been determined based on a corresponding portion of the document image. In some embodiments, for each data entry form field a corresponding location within the document image from which the data value entered by the extraction module in that form field was extracted, for example the portion that shows the text to which OCR or other techniques were applied to determine the text present in the image, is recorded. In the example shown, the data extraction module 310 provides the populated form to a validation module 314 configured to perform validation (automated and/or human as configured and/or required). In some embodiments, the validation module 314 applies one or more validation rules to identify fields that may require a human operator to validate. In the example shown, the validation module 314 may communicate via a communications interface 316, for example a network interface card or other communications interface, to obtain external data to be used in validation and/or to generate and provide to human indexers via associated client systems, such as one or more of clients 212 of FIG. 2, tasks to perform human/manual validation of all or a subset of form fields. The validated data is provided to a delivery/output module 318 configured to provide output via communications interface 316, for example by storing the document image and/or extracted data (structured data as captured using the corresponding data entry form) in an enterprise content management system or other repository.
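The confidence-based selection of fields for human validation described above might be sketched as follows. This is a minimal illustration only; the `ExtractedField` record, the 0.0 to 1.0 confidence scale, and the 0.90 threshold are assumptions made for the example and are not specified in this description.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical 0.0-1.0 score reported by the extraction step
    page: int          # index of the page the value was extracted from
    box: tuple         # (left, top, right, bottom) location in the page image


def needs_human_validation(fields, threshold=0.90):
    """Return the fields whose confidence falls below the assumed threshold."""
    return [f for f in fields if f.confidence < threshold]


fields = [
    ExtractedField("date", "2014-03-01", 0.98, 0, (40, 30, 180, 52)),
    ExtractedField("phone", "888-555-1348", 0.71, 0, (40, 60, 200, 82)),
]
queue = needs_human_validation(fields)  # only the low-confidence "phone" field
```

Recording the page index and location with each value is what later allows an interface to show the matching image portion without the operator searching for it.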

FIG. 4 is a block diagram illustrating an embodiment of a data validation user interface. In the example shown, validation interface 400 includes a document image display area 402, a data entry form interface 404, and a navigation frame 406. A document image 408 is displayed in document image display area 402. In the example shown, portions of document image 408 that correspond to data entry form fields in the form shown in data entry form interface 404 are highlighted, as indicated in FIG. 4 by the cross-hatched rectangles in document image 408 as shown. In this example, thumbnails are shown in navigation frame 406, each corresponding for example to an associated document and/or page from which data has been captured. In this example, the topmost thumbnail image as shown in navigation frame 406 of FIG. 4 is highlighted (thicker outer outline), indicating that document image 408 as displayed in document image display area 402 corresponds to the topmost thumbnail. In some embodiments, controls are provided (e.g., on screen controls, keystrokes or combinations, etc.) to enable the operator to pan, scroll, and/or zoom in/out with respect to the document image 408, for example to focus and zoom in on (magnify) a particular portion of the document image 408. In some embodiments, as the operator validates each field a cursor advances to the next field and a corresponding portion of the document image 408 is highlighted.

FIG. 5 is a screen shot illustrating an embodiment of a technique to minimize eye strain and/or fatigue in manual indexing. In the example shown, partial screen shot 500 includes a portion of a manual data validation user interface that includes a data entry form field 502, in this example with a current value of “888-555-1348” displayed, and nearby to the form field, as displayed in the data entry form portion of the data validation interface, a snippet 504 taken from a corresponding document image, which shows just the portion of the document image that contains the image of the text (in this case numerical values) extracted from the document to populate the form field 502. In this example, a confirmation or other informational and/or error message 506 similarly is displayed near the form field 502. As a result, the form field 502, corresponding snippet 504, and confirmation message 506 are all in the line of sight, or nearly so, at the same time, enabling all information required to validate the value entered in the form field 502, including entering any correction that may be required, to be viewed at the same time and/or with minimal eye or head movement and without requiring the operator to scan back and forth between the document image frame and the data entry form, and/or to scroll, pan, or zoom in/out in the document image as viewed to locate and scale to a readable size the text to be validated. In some embodiments, the snippet 504 is scaled to ensure readability, for example by including in the snippet only (or mostly) the text to be validated and scaling the image to a readable size, for example until the image is of at least a prescribed minimum size and/or the displayed characters are of a prescribed minimum “point” or other size.

In some embodiments, as an operator finishes validation of a field, indicated for example by pressing the “enter” key or selecting another key or on screen control, the system automatically pans to the next data entry form field and retrieves and displays near the form field a corresponding document image snippet. In this way, the operator can navigate through the form and corresponding portions of the document image without retargeting, i.e., without having to redirect their eyes to a different point or points on the screen.

FIG. 6 is a flow chart illustrating an embodiment of a process to facilitate manual indexing. In various embodiments, the process of FIG. 6 is used to provide an interface such as the one shown and described above in connection with FIG. 5. In the example shown in FIG. 6, a snippet containing the text or other document image portion corresponding to a data entry form field to be validated is obtained, and an association between the snippet and/or the associated location in the document image, on the one hand, and the corresponding form field, on the other hand, is stored (602). The snippet is scaled as/if needed for readability (604). The scaled (if applicable) snippet is displayed adjacent or otherwise near to the form field where corresponding extracted data to be validated is displayed and/or entered (606).
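As an illustration of the scaling step (604), a scale factor could be chosen so that the snippet reaches assumed minimum display dimensions and is never shrunk. The function name and the pixel minimums below are hypothetical; the description only states that a prescribed minimum size is used.

```python
def snippet_scale(box, min_width=200, min_text_height=14):
    """Return the magnification needed for a snippet to meet assumed minimums.

    box is (left, top, right, bottom) in page-image pixels. The minimum
    width and text height are illustrative values, not prescribed ones.
    """
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        raise ValueError("snippet box must have positive width and height")
    # Never scale below 1.0: a snippet that is already readable is left as is.
    return max(1.0, min_width / width, min_text_height / height)


small = snippet_scale((40, 60, 140, 67))   # 100 x 7 pixel snippet is enlarged
large = snippet_scale((0, 0, 400, 40))     # already large enough
```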

As noted above, pages comprising a multi-page document typically have been processed separately, each page having its own corresponding electronic data entry form associated with it. The per-page form approach has a number of shortcomings. For example, a value (e.g., an account number on the footer of each page in an invoice) may occur in several pages. An error on a single page results in work for the operator, because typically there is no framework to reconcile data across pages and to auto-correct data. In addition, in production, the operator will only become aware of the problem upon navigating to the page. If there are large discrepancies between values of many pages, then the operator must manually look at each page and that takes time.

In semi-structured and unstructured documents, there can be any number of variations of pages. If the data-entry form is page-based, and a unique form per page is used, this results in an unmanageable number of forms. If a generic form that contains a union of possible fields is used, this results in forms with unused fields. This requires extra work to handle. Furthermore, if a value is copied from another page, its source value and location typically is not shown because only the current page, and not the page from which the copied value was extracted, is shown. If the page is changed, it would result in the data-entry form being changed. Changing the data-entry form then disrupts the sequence of work, resulting in lower operator efficiency.

Under the form-per-page approach, when a table spans multiple pages, the technique of copying data between pages results in a large set of duplicate values. Extra effort is then needed to synchronize if the user makes any changes on any page. The navigation problem described above is compounded. For example, suppose the sub-totals on a multi-page invoice line items table do not add up. It is more cumbersome for the operator to go through each page and then each table, and to work with duplicate row values.

In content management systems, the metadata object is usually not defined per page. To export per-page forms, effort must be made to map values to their corresponding attributes of a metadata object used to represent the multi-page document in the content management system.

In light of all the foregoing shortcomings of the per-page approach to document capture as applied to multi-page documents, automatic detection and processing of a multi-page document as a single document is disclosed. In various embodiments, automatic detection of the pages comprising a multiple page document is performed. Data values are extracted from the pages comprising the document and used to populate a single electronic data entry form for the multi-page document. The operator can then go through the electronic data entry form, for example to validate data fields as required, and the document capture and/or validation system shows the location in the captured document of the corresponding data, regardless of which page(s) it occurs in, rather than the operator having to find and/or choose each page, indexing each independently, and then reconcile later data that occurs in and/or spans multiple pages.

FIG. 7 is a block diagram illustrating an embodiment of a document capture system and process. In the example shown, scanned pages 702, 704, and 706 comprise a multi-page document. First page recognition and/or other techniques are applied in various embodiments to detect automatically the beginning and/or ending of a multi-page document. The pages 702, 704, and 706 are identified through a process 708 as comprising a single multi-page document. A corresponding document type is determined and data values are extracted from pages 702, 704, and 706 to populate a single data entry form 710 configured to capture data values extracted from the multi-page document. In the example shown, the respective locations within the page images 702, 704, and 706 of data extracted to populate form 710 are shown as small cross-hatched rectangles. The rows at the bottom of page 702 and the top of page 704 in this example comprise a single table, list, or other array that spans pages 702 and 704. The corresponding extracted data values are in some embodiments captured initially in page specific arrays, the rows of which are concatenated in the example shown to populate the single table at the bottom of form 710.
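One way to picture the grouping of scanned pages into multi-page documents is the sketch below. It assumes each page record already carries a hypothetical `is_first` flag produced by some first page recognition step; how that flag is computed is outside the scope of the example.

```python
def group_pages(pages):
    """Group a stream of scanned pages into documents.

    pages is an ordered list of dicts with an 'id' and a boolean 'is_first'
    (a hypothetical flag set by first page recognition). A new document is
    started at every detected first page.
    """
    documents = []
    for page in pages:
        if page["is_first"] or not documents:
            documents.append([])
        documents[-1].append(page["id"])
    return documents


stream = [
    {"id": "p1", "is_first": True},
    {"id": "p2", "is_first": False},
    {"id": "p3", "is_first": False},
    {"id": "p4", "is_first": True},
]
documents = group_pages(stream)
```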

FIG. 8 is a block diagram illustrating an embodiment of an interface to validate a multi-page document. In the example shown, the interface 800 includes a page image display area 802, in which in the example shown an image of page 702 of FIG. 7 is shown. The interface 800 further includes a data entry form area 804, in this example corresponding to the form 710 of FIG. 7. Thumbnails for the pages 702, 704, and 706 of FIG. 7 (not numbered individually in FIG. 8) are displayed in navigation pane 806. In the example shown, the topmost thumbnail as displayed in navigation pane 806 is highlighted as being currently “selected” for display in page image display area 802. In various embodiments, selection by a human operator of a thumbnail in navigation pane 806 results in an image of the corresponding page being displayed in page image display area 802. In some embodiments, as an operator navigates to different form fields in the form area 804, a corresponding portion or portions of the multi-page document, in one or more pages, may be navigated to automatically. For example, navigation to a first row of the three column table at the bottom of the form in this example may in some embodiments cause the first page 702 to be displayed. Selection of a cell in one of the bottom three rows, either manually or automatically as the system advances to a next field to be validated, in various embodiments may cause the second page of the multi-page document, from which the corresponding data was extracted in this example, to be displayed in the page image display area 802. In various embodiments, selection of a field in form area 804 results in a snippet of a corresponding portion of the page from which the data associated with that field was extracted being determined, retrieved, and displayed, for example in a location adjacent or nearly adjacent to the field, as described above.

FIG. 9A is a flow chart illustrating an embodiment of a process to capture document data. In the example shown, the beginning and/or end of a multi-page document is/are detected (902). For example, known techniques to detect a first page may be used, and a multi-page document may be determined to have been encountered if one or more subsequent pages are scanned before a next “first” page is detected. A document type is determined and a corresponding data entry form instance is created (904). Scalar (single value) and array (tables, lists, or other two dimensional sets of data) data values are identified and extracted, for example using OCR, OMR, or other automated extraction techniques (906). Occurrences of the same and/or dependent values in multiple locations, including across page boundaries, may be used to perform automated and/or manual validation (908). For example, a name that appears at the beginning of a life insurance application and again in an attached report of a physical examination may be cross-checked to determine the accuracy of data extraction from one or both of the locations. Rows of arrays that span multiple pages are concatenated into a single form table (910). Array values may be validated using the full table, including across page boundaries (912). For example, quantity and unit price fields may be multiplied and the result compared to a line item subtotal, subtotals in all rows (including potentially across page boundaries) may be summed and compared to an extracted total, etc.
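Steps 910 and 912 can be illustrated with a short sketch. The row layout (`quantity`, `unit_price`, `subtotal`), the function names, and the sample figures are assumptions made for the example; `Decimal` is used only to avoid floating point rounding in the comparison.

```python
from decimal import Decimal


def concatenate_rows(page_tables):
    """Concatenate per-page arrays into one table, remembering each row's page."""
    rows = []
    for page_index, table in enumerate(page_tables):
        for row in table:
            rows.append({**row, "page": page_index})
    return rows


def check_table(rows, total):
    """Return a list of failures found across the full, cross-page table."""
    errors = []
    for i, row in enumerate(rows):
        if row["quantity"] * row["unit_price"] != row["subtotal"]:
            errors.append(("row", i, row["page"]))
    if sum(r["subtotal"] for r in rows) != total:
        errors.append(("total", None, None))
    return errors


page_tables = [
    [{"quantity": Decimal(2), "unit_price": Decimal("3.50"), "subtotal": Decimal("7.00")}],
    [{"quantity": Decimal(1), "unit_price": Decimal("4.00"), "subtotal": Decimal("5.00")}],
]
rows = concatenate_rows(page_tables)
errors = check_table(rows, Decimal("12.00"))
```

In this sample the subtotals do sum to the total, so the only failure reported is the second row, which was extracted from the second page.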

FIG. 9B is a flow chart illustrating an embodiment of a process to capture document data. In the example shown, a library of metadata document types is defined, with each document type containing scalar fields and tables of array fields (922). Automatic page recognition is done as in prior art, with page types determined (924). Values are extracted into per-page scalar and array fields by name, and each field's location on the page is saved (926). The multi-page document type is determined from an analysis of the stream of page types (928). Data from the component pages is automatically combined into the document type (930). A given named scalar field may occur on any page, or in multiple pages. Data validation is performed (932).
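The combining step (930) might look like the following sketch, in which every occurrence of a named field is kept together with the page and location it came from. The data layout and the field names are hypothetical and chosen only to make the example concrete.

```python
def combine_pages(page_fields):
    """Merge per-page named fields into one document-level mapping.

    page_fields is a list with one dict per page, each mapping a field name
    to a (value, box) pair. Every occurrence is kept, with its page index
    and location, so that a field appearing on several pages can later be
    cross-checked and its image portion displayed.
    """
    document = {}
    for page_index, fields in enumerate(page_fields):
        for name, (value, box) in fields.items():
            document.setdefault(name, []).append(
                {"value": value, "page": page_index, "box": box}
            )
    return document


doc = combine_pages([
    {"account": ("12345", (10, 900, 120, 920))},
    {"account": ("12845", (10, 900, 120, 920)), "total": ("12.00", (400, 700, 480, 720))},
])
# Occurrences of the same named field that disagree are candidates for validation.
mismatched = len({o["value"] for o in doc["account"]}) > 1
```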

FIG. 10 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document. In the example shown, an indication is received that an operator is done validating a currently displayed data value (1002), e.g., the “Date” field in the example shown in FIG. 8. If no more fields remain to be validated (1004), the process ends. Otherwise, if the next field to be validated is on the same page (1006) the next field in the data entry form is advanced to and displayed, and a corresponding snippet or other portion of the current page, from which the associated data value to be validated was extracted, is displayed adjacent to the form field (1008). If the next form field requiring validation is associated with data from a different page of the multi-page document (1006), the system automatically retrieves or otherwise accesses the other page and/or an applicable portion thereof (e.g., a corresponding snippet) (1010), transparently to the human operator, and the next form field and the corresponding snippet obtained from the other page of the multi-page document are displayed for validation (1008) transparently to and without requiring any further action by the human operator.
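The decision at steps 1004 and 1006 could be sketched as below. The field records and the `needs_validation` flag are hypothetical; the point of the example is only that the page switch is derived from the field's recorded page rather than chosen by the operator.

```python
def next_field(fields, current_index, current_page):
    """Find the next field to validate after current_index.

    fields is the ordered list of form fields, each a dict with 'name',
    'page', and 'needs_validation' (all assumed names). Returns a tuple
    (index, page, page_changed), or None when nothing remains (1004).
    """
    for i in range(current_index + 1, len(fields)):
        field = fields[i]
        if field["needs_validation"]:
            # page_changed tells the caller to fetch another page image (1010).
            return i, field["page"], field["page"] != current_page
    return None


form_fields = [
    {"name": "date", "page": 0, "needs_validation": True},
    {"name": "account", "page": 0, "needs_validation": False},
    {"name": "total", "page": 2, "needs_validation": True},
]
step = next_field(form_fields, current_index=0, current_page=0)
done = next_field(form_fields, current_index=2, current_page=2)
```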

FIG. 11 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document. In the example shown, a definition of a library of validation rules is received (1102). Examples include, without limitation, a rule requiring that a first value extracted from a named field A must match a second value extracted from a named field B. Another example is a rule requiring that a sum or other computation based on a specified set of fields must equal a value extracted from another named field, e.g., subtotals in an array must sum to equal a total. Document type definitions are received (1104). Each definition identifies validation rules to be applied, and as applicable a mapping to the document type fields to be used to apply each rule. An operator interface is provided that facilitates multi-field validation, including across page boundaries (1106). In some embodiments, for example, the interface enables an operator to iterate through just the dependent fields that require validation. As the operator corrects and/or confirms the entered value for a first dependent field, for example, and hits “enter”, the system advances automatically to display a next one of the dependent fields and its associated document image portion, from whichever page in which it may be located. The system iterates through the dependent fields until the operator enters data that clears the validation error and/or there are no more dependent fields to be displayed. In various embodiments, by combining data extracted from multiple page images of a multi-page document into a single document type and associated data entry form, automated and manual validation of dependent data fields that occur on or across different pages is facilitated, without requiring software code and/or human action to navigate between data entry forms used to capture data extracted from individual pages.
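A rule library (1102) and a document type definition that binds rules to named fields (1104) might be represented as in this sketch. The rule names, the `INVOICE_TYPE` structure, and the field names are all assumptions introduced for illustration.

```python
# Hypothetical rule library: each rule is a reusable check that knows
# nothing about where on which page its inputs were found.
RULE_LIBRARY = {
    "must_match": lambda a, b: a == b,
    "sum_equals": lambda parts, total: sum(parts) == total,
}

# Hypothetical document type definition: which rules apply, and which
# named fields of the document type supply each rule's inputs.
INVOICE_TYPE = {
    "rules": [
        {"rule": "must_match", "fields": ["name_first", "name_second"]},
        {"rule": "sum_equals", "fields": ["subtotals", "total"]},
    ]
}


def failed_rules(doc_type, values):
    """Return the rule bindings that fail, each listing its dependent fields."""
    failures = []
    for binding in doc_type["rules"]:
        check = RULE_LIBRARY[binding["rule"]]
        args = [values[name] for name in binding["fields"]]
        if not check(*args):
            failures.append(binding)
    return failures


values = {
    "name_first": "A. Smith",
    "name_second": "A. Smith",
    "subtotals": [7, 4],
    "total": 12,
}
failures = failed_rules(INVOICE_TYPE, values)
```

Because each failed binding carries its own list of dependent fields, an interface can iterate through just those fields, whichever pages they came from.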

FIG. 12 is a flow chart illustrating an embodiment of a process to perform validation of data values extracted from a multi-page document. In the example shown, an instance of a multi-page document type is received (1202). Applicable validation rules are evaluated, e.g., sequentially, including those requiring the concurrent processing of data values extracted from different pages, and any dependent fields are marked as having an error if a rule fails (1204). If during validation by a human operator (see, e.g., FIG. 10) a data value as extracted is corrected, e.g., the human operator enters a corrected value in a form field, validation rules affected by the change are re-evaluated, for example to ensure a correction that satisfied a first rule did not introduce an inconsistency that caused a second rule to not be satisfied. If the value for a field is visually confirmed with the page image to be correct, then the field can be flagged so it is henceforth no longer marked as having an error when a rule is re-evaluated (1204). In this way, operators can be more efficient by navigating only to unconfirmed fields.
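The re-evaluation and confirmation behavior described above can be sketched as follows. The `FormInstance` class and its method names are hypothetical; the sketch re-evaluates only rules that depend on the changed field and excludes visually confirmed fields from the set marked as being in error.

```python
class FormInstance:
    """Minimal sketch of a form whose rules are re-evaluated on each change."""

    def __init__(self, values, rules):
        self.values = dict(values)
        self.rules = rules        # list of (field_names, predicate) pairs
        self.confirmed = set()    # fields the operator has visually confirmed
        self.failed = set()       # indexes of rules that currently fail
        self._evaluate(range(len(rules)))

    def _evaluate(self, indexes):
        for i in indexes:
            names, predicate = self.rules[i]
            if predicate(*(self.values[n] for n in names)):
                self.failed.discard(i)
            else:
                self.failed.add(i)

    def update(self, name, value):
        """Store a corrected value and re-evaluate only the affected rules."""
        self.values[name] = value
        self._evaluate(i for i, (names, _) in enumerate(self.rules) if name in names)

    def confirm(self, name):
        self.confirmed.add(name)

    def error_fields(self):
        """Dependent fields of failing rules, minus confirmed fields."""
        marked = set()
        for i in self.failed:
            marked.update(self.rules[i][0])
        return marked - self.confirmed


rules = [
    (("price", "quantity", "subtotal"), lambda p, q, s: p * q == s),
    (("subtotal", "total"), lambda s, t: s == t),
]
form = FormInstance({"price": 3, "quantity": 2, "subtotal": 7, "total": 7}, rules)
form.confirm("price")
form.confirm("quantity")
remaining = form.error_fields()   # only the unconfirmed subtotal is left
form.update("subtotal", 6)        # satisfies the first rule, breaks the second
after_fix = form.error_fields()
```

The sample shows the case mentioned in the text: a correction that satisfies one rule causes a second rule to fail, and the affected fields are marked immediately.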

In various embodiments, human operator validation of errors involving fields that have dependency relationships with other fields, such as a “name” value that occurs in more than one page of a multi-page document, is facilitated by displaying the fields together, in a single screen, along with each field's corresponding document image snippet, even if the snippets come from different pages. Likewise, as an operator iterates through error fields in a table or other two dimensional data structure, corresponding snippets are displayed, even if they come from multiple, different pages. The human operator need only navigate through fields in a single data entry form, and the system transparently retrieves and displays for each field its corresponding snippet or other partial image, without regard to page boundaries.

Former approaches to enable manual validation of data extracted from a multi-page document include field-level validation, a validation command, and rule-based validation techniques. In the field-level validation approach, each field is evaluated independently. The same rule must be evaluated multiple times, which may trigger multiple database query round-trips. If the dependent fields are not contiguous, then the operator cannot fix the error in context, and must spend extra effort to manually navigate back and forth between fields. This is especially costly if the dependent fields are on different pages. Also, if each field is evaluated independently, it is very complicated to determine the real reason for failure. In the validation command approach, a triggered command, such as an explicit button, is used to execute the validation logic and mark all the invalid fields. However, in this approach the operator must explicitly activate the command; navigational assistance typically is not provided to make the operator more efficient; and the operator may have made multiple changes between commands, so it is harder to determine which change caused a given error. Finally, in the rule-based approach, some applications may flag errors on multiple parts due to a rule (e.g., compilers and tax return software), but given current demands of high-speed document capture, the rule-based approach is insufficient because the operator must be able to fix the error very quickly. Rule-based validation may deal with error visualization, but not navigation and correction.

Using techniques disclosed herein, the navigation and visualization of multiple dependent errors can occur more quickly, resulting in increased operator task completion speed. In various embodiments, each rule is independently constructed so the same field may be a dependency for multiple rules. For example, in an invoice table, price × quantity = subtotal, and the sum of all sub-totals = grand total. If the summation of sub-totals is correct but a row's product rule fails, then the error must be in the price or quantity. The user does not have to construct complex logic to determine if a given field is valid and its appropriate validation message. Rules are automatically re-evaluated as the user updates a dependency field. This gives the user immediate notification and saves operator time in searching for additional errors.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
 1. A method of validating data, comprising: extracting data values from digital content that is generated by recognizing characters or marks from a multi-page document; generating a data entry form comprising the data values extracted from the digital content; receiving an indication that one or more validation rules associated with two or more dependent fields in the data entry form comprising the data values extracted from the digital content has failed; providing a human validation interface that enables an operator to view the affected dependent fields and for each an associated document image portion from which a corresponding data value was extracted, including by providing automated navigation to and display of the affected dependent fields; receiving operator input that changes a data value of one or more of the affected dependent fields, and in response to the operator input, re-evaluating a first validation rule of the one or more validation rules; and re-evaluating a second validation rule of the one or more validation rules affected by the changes to the data value of said one or more of the affected dependent fields.
 2. The method of claim 1, wherein the affected dependent fields are associated with content that occurs on two or more different pages of the multi-page document.
 3. The method of claim 2, wherein the interface presents the dependent fields and associated data values and document image portions without requiring the operator to navigate manually to a different page or data entry form.
 4. The method of claim 1, wherein the human validation interface enables the operator to cycle through the affected dependent fields sequentially.
 5. The method of claim 1, wherein the human validation interface displays the affected dependent fields and associated data values simultaneously.
 6. The method of claim 5, wherein the human validation interface displays the affected dependent fields and associated data values simultaneously at least in part by generating dynamically and displaying to the operator a sub-form.
 7. The method of claim 1, further comprising receiving a definition of the validation rule.
 8. The method of claim 7, wherein the validation rule defines a dependency between the dependent fields without reference to a page location within the multi-page document.
 9. The method of claim 7, further comprising associating the validation rule with a document type with which the data entry form is associated.
 10. The method of claim 1, wherein the multi-page document is captured by scanning a multi-page physical document.
 11. The method of claim 1, wherein the digital content is generated using one or more of Optical Character Recognition (OCR) or Optical Mark Recognition (OMR).
 12. The method of claim 1, wherein the generating of the data entry form comprises populating a single electronic form that corresponds to a multi-page document.
 13. The method of claim 1, wherein the generating of the data entry form comprises determining relationships between one or more data values extracted from digital content associated with a first page of the multi-page document and one or more data values extracted from digital content associated with a second page of the multi-page document.
 14. A document capture system, comprising: a display device; and a processor coupled to the display device and configured to: extract data values from digital content that is generated by recognizing characters or marks from a multi-page document; generate a data entry form comprising the data values extracted from the digital content; receive an indication that one or more validation rules associated with two or more dependent fields in the data entry form comprising the data values extracted from the digital content has failed; provide via the display device a human validation interface that enables an operator to view the affected dependent fields and for each an associated document image portion from which a corresponding data value was extracted, including by providing automated navigation to and display of the affected dependent fields; receive operator input that changes a data value of one or more of the affected dependent fields, and in response to the operator input, re-evaluate a first validation rule of the one or more validation rules; and re-evaluate a second validation rule of the one or more validation rules affected by the changes to the data value of said one or more of the affected dependent fields.
 15. The system of claim 14, wherein the affected dependent fields are associated with content that occurs on two or more different pages of the multi-page document.
 16. The system of claim 15, wherein the interface presents the dependent fields and associated data values and document image portions without requiring the operator to navigate manually to a different page or data entry form.
 17. The system of claim 14, wherein the processor is configured to display the affected dependent fields and associated data values simultaneously, via the human validation interface, at least in part by generating dynamically and displaying via the display device a sub-form.
 18. The system of claim 14, wherein the validation rule defines a dependency between the dependent fields without reference to a page location within the multi-page document.
 19. The system of claim 14, further comprising a memory or other data storage device coupled to the processor and configured to store one or both of the data entry form and said data values extracted from the multi-page document.
 20. The system of claim 14, wherein the processor is further configured to apply the validation rule and mark the affected dependent fields as having an error in the event the validation rule fails.
 21. A computer program product to validate data, the computer program product being embodied in a tangible computer readable storage medium and comprising computer instructions for: extracting data values from digital content that is generated by recognizing characters or marks from a multi-page document; generating a data entry form comprising the data values extracted from the digital content; receiving an indication that one or more validation rules associated with two or more dependent fields in the data entry form comprising the data values extracted from the digital content has failed; providing a human validation interface that enables an operator to view the affected dependent fields and for each an associated document image portion from which a corresponding data value was extracted, including by providing automated navigation to and display of the affected dependent fields; receiving operator input that changes a data value of one or more of the affected dependent fields, and in response to the operator input, re-evaluating a first validation rule of the one or more validation rules; and re-evaluating a second validation rule of the one or more validation rules affected by the changes to the data value of said one or more of the affected dependent fields.